Error Back-propagation to Phonetic Classification
Abstract
This paper is concerned with the use of error back-propagation in phonetic classification. Our objective is to investigate the basic characteristics of back-propagation, and to study how the framework of multi-layer perceptrons can be exploited in phonetic recognition. We explore issues such as integration of heterogeneous sources of information, conditions that can affect the performance of phonetic classification, internal representations, comparisons with traditional pattern classification techniques, comparisons of different error metrics, and initialization of the network. Our investigation is performed within a set of experiments that attempt to recognize the 16 vowels in American English independent of speaker. Our results are comparable to human performance.

Early approaches to phonetic recognition fall into two major extremes: heuristic and algorithmic. Both approaches have their own merits and shortcomings. The heuristic approach has the intuitive appeal that it focuses on the linguistic information in the speech signal and exploits acoustic-phonetic knowledge. However, the weak control strategy used for utilizing our knowledge has been grossly inadequate. At the other extreme, the algorithmic approach relies primarily on the powerful control strategy offered by well-formulated pattern recognition techniques. However, relatively little is known about how the speech knowledge accumulated over the past few decades can be incorporated into these well-formulated algorithms. We feel that artificial neural networks (ANN) have some characteristics that can potentially enable them to bridge the gap between these two extremes. On the one hand, our speech knowledge can provide guidance to the structure and design of the network. On the other hand, the self-organizing mechanism of ANN can provide a control strategy for utilizing our knowledge. In this paper, we extend our earlier work on the use of artificial neural networks for phonetic recognition [2].
Specifically, we focus our investigation on the following sets of issues. First, we describe the use of the network to integrate heterogeneous sources of information. We will see how classification performance improves as more information becomes available. Second, we discuss several important factors that can substantially affect the performance of phonetic classification. Third, we examine the internal representation of the network. Fourth, we compare the network with two traditional classification techniques: K-nearest neighbor and Gaussian classification. Finally, we discuss our specific implementations of back-propagation that yield improved performance and more efficient learning time.
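The kind of comparison described above (a back-propagation network against K-nearest neighbor and Gaussian classification) can be sketched on toy data. This is only an illustration under stated assumptions, not the authors' experimental setup: the three-class synthetic 2-D data, the function names, the single tanh hidden layer, the softmax output with a cross-entropy error, and all hyperparameters are hypothetical choices made for this sketch.

```python
import numpy as np

N_CLASSES = 3  # toy stand-in; the paper classifies 16 vowels


def make_data(rng, n_per_class=100):
    # Synthetic, well-separated 2-D clusters (assumption: NOT speech features).
    means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    X = np.vstack([m + 0.6 * rng.standard_normal((n_per_class, 2)) for m in means])
    y = np.repeat(np.arange(N_CLASSES), n_per_class)
    return X, y


def knn_predict(X_train, y_train, X_test, k=5):
    # K-nearest neighbor: majority vote among the k closest training points.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v, minlength=N_CLASSES).argmax()
                     for v in y_train[nearest]])


def gaussian_predict(X_train, y_train, X_test):
    # Gaussian classifier: one full-covariance Gaussian per class, equal priors.
    scores = []
    for c in range(N_CLASSES):
        Xc = X_train[y_train == c]
        mu, cov = Xc.mean(axis=0), np.cov(Xc, rowvar=False)
        diff = X_test - mu
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        scores.append(-0.5 * maha - 0.5 * np.log(np.linalg.det(cov)))
    return np.argmax(np.stack(scores, axis=1), axis=1)


def mlp_train_predict(X_train, y_train, X_test, rng, hidden=8, lr=0.5, epochs=500):
    # One-hidden-layer perceptron trained by full-batch error back-propagation.
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Xs, Xt = (X_train - mu) / sd, (X_test - mu) / sd   # standardize inputs
    T = np.eye(N_CLASSES)[y_train]                     # one-hot targets
    W1 = 0.5 * rng.standard_normal((2, hidden))
    b1 = np.zeros(hidden)
    W2 = 0.5 * rng.standard_normal((hidden, N_CLASSES))
    b2 = np.zeros(N_CLASSES)
    for _ in range(epochs):
        H = np.tanh(Xs @ W1 + b1)                      # forward pass
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)              # softmax outputs
        dZ = (P - T) / len(Xs)                         # output-layer error
        dH = (dZ @ W2.T) * (1.0 - H ** 2)              # error propagated back
        W2 -= lr * (H.T @ dZ)
        b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * (Xs.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return np.argmax(np.tanh(Xt @ W1 + b1) @ W2 + b2, axis=1)


rng = np.random.default_rng(0)
X_train, y_train = make_data(rng)
X_test, y_test = make_data(rng)
accuracies = {
    "knn": float(np.mean(knn_predict(X_train, y_train, X_test) == y_test)),
    "gaussian": float(np.mean(gaussian_predict(X_train, y_train, X_test) == y_test)),
    "mlp": float(np.mean(mlp_train_predict(X_train, y_train, X_test, rng) == y_test)),
}
print(accuracies)
```

On data this easy all three classifiers should score similarly, so the sketch shows only the mechanics of such a comparison; it says nothing about the relative performance reported in the paper.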
Similar resources
Speaker independent bimodal phonetic recognition experiments
A speaker independent bimodal phonetic classification experiment regarding the Italian plosive consonants is described. The phonetic classification scheme is based on a feed forward recurrent back-propagation neural network working on audio and visual information. The speech signal is processed by an auditory model producing spectral-like parameters, while the visual signal is processed by a sp...
Phonetic recognition by recurrent neural networks working on audio and visual information
A phonetic classification scheme based on a feed forward recurrent back-propagation neural network working on audio and visual information is described. The speech signal is processed by an auditory model producing spectral-like parameters, while the visual signal is processed by a specialised hardware, called ELITE, computing lip and jaw kinematics parameters. Some results will be given for va...
Phonetic classification of TIMIT segments preprocessed with Lyon's cochlear model using a supervised/unsupervised hybrid neural network
We report results on vowel and stop consonant recognition with tokens extracted from the TIMIT database. Our current system differs from others doing similar tasks in that we do not use any specific time normalization techniques. We use a very detailed biologically motivated input representation of the speech tokens: Lyon's cochlear model as implemented by Slaney [20]. This detailed, high dimension...
A Comparison of Graph Construction and Learning Algorithms for Graph-Based Phonetic Classification
Graph-based semi-supervised learning (SSL) algorithms have been widely applied in large-scale machine learning. In this work, we show different graph-based SSL methods (modified adsorption, measure propagation, and prior-based measure propagation) and compare them to the standard label propagation algorithm on a phonetic classification task. In addition, we compare 4 different ways of construct...
Classification of ECG signals using Hermite functions and MLP neural networks
Classification of heart arrhythmia is an important step in developing devices for monitoring the health of individuals. This paper proposes a three-module system for classification of electrocardiogram (ECG) beats. These modules are: a denoising module, a feature extraction module and a classification module. In the first module the stationary wavelet transform (SWT) is used for noise reduction of ...